The pandas library provides many ways to filter, select, reshape and modify a data frame. Pandas supports as well method chaining, which means that
many DataFrame
methods return a modified DataFrame
. This allows the user to chain multiple operations together, making it
effortless perform several of them in one line of code:
import pandas as pd
schema = {'name':str, 'domain': str, 'revenue': 'Int64'}
joe = pd.read_csv("data.csv", dtype=schema).set_index('name').filter(like='joe', axis=0).groupby('domain').mean().round().sample()
While this code is correct and concise, it can be challenging to follow its logic and flow, making it harder to debug or modify in the future.
To improve code readability, debugging, and maintainability, it is recommended to break down long chains of pandas instructions into smaller, more
modular steps. This can be done with the help of the pandas pipe
method, which takes a function as a parameter. This function takes the
data frame as a parameter, operates on it and returns it for further processing. Grouping complex transformations of a data frame inside a function
with a meaningful name can further enhance the readability and maintainability of the code.